.. code:: ipython3

    import json
    import os
    import pandas as pd
    import shutil

    from seeq import spy

    # Set the compatibility option so that you maximize the chance that SPy will remain compatible with your notebook/script
    spy.options.compatibility = 192

.. code:: ipython3

    # Log into Seeq Server if you're not using Seeq Data Lab:
    spy.login(url='http://localhost:34216', credentials_file='../credentials.key', force=False)

Workbook Jobs
=============

In ``spy.workbooks.ipynb``, you can learn how to push and pull workbooks
(Workbench Analyses and Organizer Topics) to/from the Seeq service/server
using SPy. You may need to do this work “in bulk” in one of the following
scenarios:

- Re-mapping references (e.g. historian tags/signals) from one datasource to
  another, or one asset tree to another
- Transferring work from one Seeq service/server to another, possibly
  including data

The set of functions in the ``spy.workbooks.job`` module is suited to this
work. Each function operates within a “job folder” that captures the state
of the job. Unlike ``spy.workbooks.pull()`` and ``spy.workbooks.push()``,
the equivalent commands in ``spy.workbooks.job`` do not require all
workbooks to be held in memory, which allows very large jobs to be executed
(as long as there is sufficient disk space). All parts of the process are
*resumable*: SPy will pick up where it left off if the operation is
interrupted for any reason (e.g. a network error).

This notebook walks through the use of this module, referencing the
scenarios above. In general, commands are executed in the following order
(sketched end-to-end just below):

1. ``spy.workbooks.job.pull()``
2. ``spy.workbooks.job.data.pull()`` (optional)
3. ``spy.workbooks.job.push()``
4. ``spy.workbooks.job.data.push()`` (optional)
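To orient yourself, here is the whole cycle in miniature, using only
functions that are covered in the sections that follow. (This is a sketch
rather than a cell to run now: the job folder and the DataFrame of workbooks
are established below, so the calls are commented out.)

.. code:: ipython3

    # A sketch of a complete job cycle; each step is explained later in this
    # notebook, so the calls are commented out here.
    #
    # workbooks_df = spy.workbooks.search({'Path': 'Some >> Folder >> Path'})
    # spy.workbooks.job.pull(job_folder, workbooks_df)  # 1. Pull workbook definitions to disk
    # spy.workbooks.job.data.pull(job_folder)           # 2. (Optional) Pull the referenced data
    # spy.workbooks.job.push(job_folder)                # 3. Push the workbooks to the target server
    # spy.workbooks.job.data.push(job_folder)           # 4. (Optional) Push data to any dummy items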
Establish the Job Folder
------------------------

The parameter that defines a job is a *job folder*. It is the first argument
for all job functions, and it is managed entirely by SPy. The folder is laid
out in an intuitive way that allows you to inspect it and, in some
troubleshooting cases, make modifications yourself.

.. code:: ipython3

    job_folder = 'Output/My First Workbooks Job'

    # Remove the job folder so that old files/artifacts don't affect the tutorial
    if os.path.exists(job_folder):
        shutil.rmtree(job_folder)

Let’s Make Something to Work With…
----------------------------------

We need some Analyses/Topics to work with for the purposes of demonstrating
the functionality, so let’s make sure the example workbooks have been
pushed.

.. code:: ipython3

    example_workbooks = spy.workbooks.load('Support Files/Example Export.zip')
    spy.workbooks.push(example_workbooks,
                       path='SPy Documentation Examples >> Workbook Job Import',
                       label=f'{spy.session.user.name} Workbook Job Example',
                       refresh=False,
                       errors='raise')

Pulling Workbooks
-----------------

Start the job cycle by issuing ``spy.workbooks.job.pull()`` to grab a set of
workbooks and write them to disk. As with ``spy.workbooks.pull()``, we
create a DataFrame full of workbooks to pull by using the
``spy.workbooks.search()`` function. Then we can supply that DataFrame to
``spy.workbooks.job.pull()``, which takes many of the same parameters as
``spy.workbooks.pull()``.

.. code:: ipython3

    workbooks_df = spy.workbooks.search({
        'Path': 'SPy Documentation Examples >> Workbook Job Import'
    })

    # Store these in variables that we'll use later
    example_analysis_workbook_id = workbooks_df[workbooks_df['Name'] == 'Example Analysis'].iloc[0]['ID']
    example_topic_workbook_id = workbooks_df[workbooks_df['Name'] == 'Example Topic'].iloc[0]['ID']

    workbooks_df

.. code:: ipython3

    spy.workbooks.job.pull(job_folder, workbooks_df)

As mentioned earlier, jobs are *resumable*. If you execute the above cell
again, you will see that the **Result** column indicates ``Already pulled``.
If you would like to force a job to redo its work, supply the
``resume=False`` argument. You can also inspect the job folder’s
``Workbooks`` subfolder and selectively delete workbook folders therein to
force SPy to re-pull those workbooks.

Pushing Workbooks
-----------------

As mentioned above, there are two primary scenarios where you want to push
workbooks in bulk:

- Re-mapping references (e.g. historian tags/signals) from one datasource to
  another, or one asset tree to another
- Transferring work from one Seeq service/server to another, possibly
  including data

Datasource Maps
~~~~~~~~~~~~~~~

In either case, it’s important to understand the concept of *datasource
maps*. These are JSON files that contain instructions for SPy as it maps the
identifiers in the pulled workbook definitions to identifiers on the target
system. These maps can incorporate relatively complex Regular Expression
specifications that allow you to re-orient workbooks from one set of input
data to another.

The ``spy.workbooks.job.pull()`` command will create a ``Datasource Maps``
folder inside the job folder. There will be one file for every datasource
that was encountered during the pull operation; if a workbook touched a
datasource in some way, there will be a file for it.

Here’s what a typical file looks like:

::

    {
        "Datasource Class": "Time Series CSV Files",
        "Datasource ID": "Example Data",
        "Datasource Name": "Example Data",
        "Item-Level Map Files": [],
        "RegEx-Based Maps": [
            {
                "Old": {
                    "Type": "(?<type>.*)",
                    "Datasource Class": "Time Series CSV Files",
                    "Datasource Name": "Example Data",
                    "Data ID": "(?<data_id>.*)"
                },
                "New": {
                    "Type": "${type}",
                    "Datasource Class": "Time Series CSV Files",
                    "Datasource Name": "Example Data",
                    "Data ID": "${data_id}"
                }
            }
        ]
    }

You can make modifications to these files by loading them into an editor,
including Jupyter’s text editor. Generally the most common action is to add
or change entries in the ``RegEx-Based Maps`` block. That section is a
*list* of *dictionaries* that each have an ``Old`` and a ``New`` subsection.

Within the ``Old`` block, you can specify properties to match on. The key is
the property name and the value is a regular expression, often employing a
*capture group*. In the example above, the ``Data ID`` field is matched
using the ``.*`` regex and stored in a capture group called ``data_id``.

The ``New`` block then contains the properties and values that will be used
to search for the corresponding items on the target system. In the example
above, the ``"Data ID": "${data_id}"`` specification means that the Data ID
is used “as-is”, without any alteration.

(If you are familiar with Connector Property Transforms, this regex approach
will feel familiar.)
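To build intuition for how an ``Old``/``New`` pair behaves, here is a rough
Python equivalent of the map above. This is an illustration only: SPy’s map
files spell named capture groups as ``(?<name>...)`` and substitutions as
``${name}``, whereas Python’s ``re`` module spells them ``(?P<name>...)``
and ``\g<name>``.

.. code:: ipython3

    import re

    # Rough Python analog of the map above: "Old" matches a property value
    # and captures part of it; "New" builds the search value by substitution.
    old_data_id = re.compile(r'(?P<data_id>.*)')  # SPy spells this (?<data_id>.*)

    # A hypothetical Data ID value, purely for illustration:
    candidate = 'Area A_Temperature'

    match = old_data_id.fullmatch(candidate)
    if match:
        # "${data_id}" in the "New" block corresponds to this substitution:
        new_data_id = match.expand(r'\g<data_id>')
        print(new_data_id)  # Unchanged: the capture is used as-is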
Let’s look at a more complicated example:

::

    {
        "Datasource Class": "Time Series CSV Files",
        "Datasource ID": "Example Data",
        "Datasource Name": "Example Data",
        "Item-Level Map Files": [],
        "RegEx-Based Maps": [
            {
                "Old": {
                    "Type": "(?<type>.*)",
                    "Datasource Class": "Time Series CSV Files",
                    "Datasource Name": "Example Data",
                    "Data ID": "(?<data_id>.*)"
                },
                "New": {
                    "Type": "${type}",
                    "Datasource Class": "Time Series CSV Files",
                    "Datasource Name": "Example Data",
                    "Data ID": "${data_id}"
                }
            },
            {
                "Old": {
                    "Type": "(?<type>.*)",
                    "Path": "Example >> Cooling Tower 1",
                    "Asset": "Area (?<subarea>[ABC])",
                    "Name": "(?<name>.*)"
                },
                "New": {
                    "Type": "${type}",
                    "Path": "Example >> Cooling Tower 2",
                    "Asset": "Area ${subarea}",
                    "Name": "${name}"
                }
            }
        ]
    }

In this example there are two RegEx-Based Maps specified. The first map is
identical to the previous example, and it is tried first; if there is no
match on its ``Old`` regex specifications, SPy moves on to the next. The
second map matches on a particular asset path
(``Example >> Cooling Tower 1``) and a set of subareas (``A``, ``B``, or
``C``) and maps each one to the same-named area underneath
``Example >> Cooling Tower 2``.

In this manner, you can use arbitrarily complex mapping logic to accomplish
the goal of re-mapping a workbook within the same Seeq server or properly
mapping from one Seeq server to another.
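If you’d rather script such a change than edit the JSON by hand, here is a
minimal sketch that appends the Cooling Tower map from the example above to
the file SPy generated for this tutorial’s datasource. (The file name
matches the one used later in this notebook; adjust it for your own job
folder. The next section overwrites this same file, so experimenting here
won’t derail the rest of the tutorial.)

.. code:: ipython3

    # A sketch: append a RegEx-Based Map to an existing datasource map file.
    map_file = os.path.join(job_folder, 'Datasource Maps',
                            'Datasource_Map_Time Series CSV Files_Example Data_Example Data.json')

    with open(map_file, 'r') as f:
        existing_map = json.load(f)

    existing_map['RegEx-Based Maps'].append({
        "Old": {
            "Type": "(?<type>.*)",
            "Path": "Example >> Cooling Tower 1",
            "Asset": "Area (?<subarea>[ABC])",
            "Name": "(?<name>.*)"
        },
        "New": {
            "Type": "${type}",
            "Path": "Example >> Cooling Tower 2",
            "Asset": "Area ${subarea}",
            "Name": "${name}"
        }
    })

    with open(map_file, 'w') as f:
        json.dump(existing_map, f, indent=4)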
Datasource Mapping in Action
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let’s run through an actual mapping scenario to see how it works and how to
troubleshoot it when it goes wrong.

First we have to grab a couple of signal IDs so that we can use them later
to illustrate some functionality.

.. code:: ipython3

    area_a_temperature_id = spy.search({'Datasource Name': 'Example Data', 'Name': 'Area A_Temperature'}).iloc[0]['ID']
    area_a_optimizer_id = spy.search({'Datasource Name': 'Example Data', 'Name': 'Area A_Optimizer'}).iloc[0]['ID']

Now we will write out a datasource map file whose first map has a ``New``
block that maps to a ``Name`` that does not exist. This will let us see what
happens both when the mapping is successful and when there are errors.

.. code:: ipython3

    datasource_map = {
        "Datasource Class": "Time Series CSV Files",
        "Datasource ID": "Example Data",
        "Datasource Name": "Example Data",
        "Item-Level Map Files": [],
        "RegEx-Based Maps": [
            {
                "Old": {
                    "Type": "(?<type>.*)",
                    "Datasource Class": "Time Series CSV Files",
                    "Datasource Name": "Example Data",
                    "Name": "Area A_Optimizer"
                },
                "New": {
                    "Type": "${type}",
                    "Datasource Class": "Time Series CSV Files",
                    "Datasource Name": "Example Data",
                    "Name": "Area NonExistent_Optimizer"
                },

                # In this contrived example, if we match on the "Old" criteria,
                # we don't want to continue to the next regex map
                "On Match": "Stop"
            },
            {
                "Old": {
                    "Type": "(?<type>.*)",
                    "Datasource Class": "Time Series CSV Files",
                    "Datasource Name": "Example Data",
                    "Data ID": "(?<data_id>.*)"
                },
                "New": {
                    "Type": "${type}",
                    "Datasource Class": "Time Series CSV Files",
                    "Datasource Name": "Example Data",
                    "Data ID": "${data_id}"
                }
            }
        ]
    }

    with open(os.path.join(job_folder, 'Datasource Maps',
                           'Datasource_Map_Time Series CSV Files_Example Data_Example Data.json'), 'w') as f:
        json.dump(datasource_map, f)

Now we push to the server using a label that is guaranteed to differentiate
our activity from that of other users. As you will see, there will be errors
reported in the **Result** column because one item won’t be mapped.

.. code:: ipython3

    push_df = spy.workbooks.job.push(job_folder,
                                     path='SPy Documentation Examples >> Workbook Jobs',
                                     label=f'{spy.session.user.name} Workbook Job Example',
                                     errors='catalog')

You can see the error, but it’s not formatted very well. So let’s use a
troubleshooting tool: the “explain” function on the returned DataFrame.

.. code:: ipython3

    print(push_df.spy.item_map.explain(area_a_optimizer_id))

This detailed explanation is intended to give you a starting point for
troubleshooting. You can see the **regex** that was specified, the property
values that were **matched on**, the **capture groups** that resulted from
the RegEx specifications, and the property values that were subsequently
**searched for**. Since ``Area NonExistent_Optimizer`` does not exist, the
explanation for ``RegEx-Based Map 0`` says *Item not found on server*.

Now let’s look at the explanation for a successful map
(``Area A_Temperature``):

.. code:: ipython3

    print(push_df.spy.item_map.explain(area_a_temperature_id))

Dummy Items
~~~~~~~~~~~

In cases where we couldn’t map successfully, we can tell SPy to create
“dummy” items. A dummy item is a signal, condition or scalar that has all
the properties of the original item but no data. (We’ll show how to push
data to dummy items later…)

Note the use of ``create_dummy_items=True``, and also note ``resume=False``
so that SPy tries to push the workbooks again:

.. code:: ipython3

    push_df = spy.workbooks.job.push(
        job_folder,
        path='SPy Documentation Examples >> Workbook Jobs',
        label=f'{spy.session.user.name} Workbook Job Example',
        create_dummy_items=True,
        resume=False,
        errors='catalog')

Now there are no errors, because every item that couldn’t be mapped was
replaced by a dummy item. Find the ``Example Analysis`` row of the output
table above and click on the *link* in the ``URL`` column to take a look at
the resulting workbook. You’ll see that the **Details Pane** worksheet
contains **Area A_Optimizer**, which is a blank “dummy item”.

We can look at what dummy items were created by inspecting
``push_df.spy.item_map.dummy_items``. You can see that the ``Name`` is the
same as the original and that important properties like
``Maximum Interpolation`` have made their way to the dummy item.

.. code:: ipython3

    push_df.spy.item_map.dummy_items
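If you want to focus on just the properties called out above, you can select
columns directly. (A sketch: the column names are assumed to match the
property names shown in the output.)

.. code:: ipython3

    # A sketch: narrow the dummy items DataFrame to a couple of columns of
    # interest. Column names are assumed to match the properties shown above.
    push_df.spy.item_map.dummy_items[['Name', 'Maximum Interpolation']]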
Including Data
--------------

Dummy items are helpful, but they are “blank”: they do not have any data
associated with them. If you are transferring workbooks between servers and
the destination server doesn’t have access to the same datasources, it is
useful to be able to transfer the data itself from the source server to the
dummy items on the destination server. A set of SPy functions is provided in
``spy.workbooks.job.data`` for this purpose.

As ``spy.workbooks.job.pull()`` pulls workbook information, it also tracks
the usage of data items on Workbench Worksheets and in Organizer Topic
Documents. This information is collated and saved to disk as the *data
manifest*. You can inspect the manifest like so:

.. code:: ipython3

    manifest_df = spy.workbooks.job.data.manifest(job_folder)

    # Simplify the DataFrame so that it fits on the screen better
    manifest_df[['ID', 'Path', 'Asset', 'Name', 'Start', 'End', 'Calculation']]

The ``Start`` and ``End`` columns provide the overall time bounds that were
detected in the workbook data references. There is also a ``Calculation``
column that refines this broad time period into specific “chunks” of data,
defined as individual capsules and using the ``within()`` and ``touches()``
Seeq Formula functions so that data is pulled only for those time periods,
not for everything between ``Start`` and ``End``.

This manifest DataFrame can be fed directly into ``spy.pull()``, but it is
recommended that you use ``spy.workbooks.job.data.pull()`` like so:

.. code:: ipython3

    spy.workbooks.job.data.pull(job_folder)

Data has now been added to the job folder for the time periods identified in
the manifest, and it can be pushed to the dummy items like so:

.. code:: ipython3

    spy.workbooks.job.data.push(job_folder)

If you refresh the page of the **Details Pane** within the **Example
Analysis**, you’ll now see data for **Area A_Optimizer**. However, if you
move the Display Range to the left, you’ll see that **Area A_Optimizer**
data only exists for the time period that was originally on the screen.

If you want to expand how much data is pulled for a particular item, you can
execute a command to alter the manifest and then pull/push again:

.. code:: ipython3

    # Expand the time periods for Area A_Optimizer by 2 weeks on either side
    spy.workbooks.job.data.expand(job_folder, {'Name': 'Area A_Optimizer'}, by='2w')

    spy.workbooks.job.data.pull(job_folder, resume=False)
    spy.workbooks.job.data.push(job_folder, resume=False)

There is a series of functions for altering the manifest:

- ``spy.workbooks.job.data.expand()`` - Expand the existing time periods.
- ``spy.workbooks.job.data.add()`` - Add a specific time period.
- ``spy.workbooks.job.data.remove()`` - Remove a specific time period.
- ``spy.workbooks.job.data.calculation()`` - Apply a specific calculation,
  such as ``resample()``.

Documentation for these functions is found under the **Detailed Help**
section below.
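To illustrate the call pattern only: the query-dictionary-first shape
mirrors the ``expand()`` call above, but the remaining arguments are
hypothetical placeholders, so consult the ``help()`` output below for the
actual signatures.

.. code:: ipython3

    # Hypothetical sketches only: the query dictionary mirrors the expand()
    # call above, but the trailing arguments are placeholders. Consult
    # help(spy.workbooks.job.data.add), etc., for the real signatures.
    #
    # spy.workbooks.job.data.add(job_folder, {'Name': 'Area A_Optimizer'}, ...)
    # spy.workbooks.job.data.remove(job_folder, {'Name': 'Area A_Optimizer'}, ...)
    # spy.workbooks.job.data.calculation(job_folder, {'Name': 'Area A_Optimizer'}, ...)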
Redoing Specific Workbooks/Items
--------------------------------

In the process of pushing and pulling, you will usually use the
``errors='catalog'`` flag, which means that errors will be enumerated but
the operation will keep going if at all possible. When you resume an
operation, items that had errors will not be re-attempted, because SPy (by
default) assumes that you have chosen to ignore them.

But you will often care about those errors, and you will figure out how to
fix them (say, by altering a Datasource Map). You can force SPy to redo a
push or pull operation for a particular item or set of items using the
``redo`` family of functions:

.. code:: ipython3

    spy.workbooks.job.redo(job_folder, example_analysis_workbook_id)

.. code:: ipython3

    spy.workbooks.job.data.redo(job_folder, area_a_optimizer_id)

Zip/Unzip the Job Folder
------------------------

If you intend to transfer workbook information to another Seeq server, it is
convenient to package up the job folder as a zip file. There are two
functions for this purpose:

.. code:: ipython3

    spy.workbooks.job.zip(job_folder, overwrite=True)

.. code:: ipython3

    spy.workbooks.job.unzip(job_folder + '.zip', overwrite=True)

Detailed Help
-------------

All SPy functions have detailed documentation to help you use them. Just
execute ``help(spy.<function>)`` like you see below.

**Make sure you re-execute the cells below to see the latest documentation.
It otherwise might be from an earlier version of SPy.**

.. code:: ipython3

    help(spy.workbooks.job.pull)

.. code:: ipython3

    help(spy.workbooks.job.push)

.. code:: ipython3

    help(spy.workbooks.job.data.pull)

.. code:: ipython3

    help(spy.workbooks.job.data.manifest)
    help(spy.workbooks.job.data.expand)
    help(spy.workbooks.job.data.add)
    help(spy.workbooks.job.data.remove)
    help(spy.workbooks.job.data.calculation)

.. code:: ipython3

    help(spy.workbooks.job.data.push)

.. code:: ipython3

    help(spy.workbooks.job.redo)
    help(spy.workbooks.job.data.redo)